AMENDMENT

This listing of claims will replace all prior versions, and listings, of claims in the 
application: 
Listing of Claims: 

1. (Currently Amended) A machine-readable medium having stored thereon executable 
instructions that when executed by a processor, cause the processor to: 

generate frequency vectors for each non-context token in a corpus based upon counted 
occurrences of a predetermined relationship of the non-context tokens to context tokens; and 

cluster the non-context tokens into a cluster tree based upon the frequency vectors 
according to a lexical correlation among the non-context tokens, wherein the cluster tree is used
in a pattern recognition system.

2. (Currently Amended) A method of grammar learning from a corpus, comprising: 

generating frequency vectors for each non-context token in a corpus based upon counted 
occurrences of a predetermined relationship of the non-context tokens to context tokens; and 

clustering the non-context tokens based upon the frequency vectors according to a lexical 
correlation among the non-context tokens, wherein the cluster tree is used in a pattern
recognition system.

3. (Original) The method of claim 2, wherein the step of clustering further comprises clustering 
the non-context tokens into a cluster tree. 

4. (Original) The method of claim 3, wherein the cluster tree represents a grammatical
relationship among the non-context tokens.

5. (Original) The method of claim 3, further comprising cutting the cluster tree along a cutting
line to separate large clusters from small clusters.

6. (Original) The method of claim 2, wherein small clusters are ranked according to a
compactness value.

7. (Original) The method of claim 2, wherein the predetermined relationship is a measure of
adjacency.

8. (Original) The method of claim 2, wherein the clustering is performed based on Euclidean
distances between the frequency vectors. 

9. (Original) The method of claim 1, wherein the clustering is performed based on Manhattan
distances between the frequency vectors. 

10. (Original) The method of claim 2, wherein the clustering is performed based on maximum
distance metrics between the frequency vectors.

11. (Original) The method of claim 2, further comprising normalizing the frequency vectors
based upon a number of occurrences of the non-context token in the corpus.






Application/Control Number: 10/662,730 Docket No.: 113256-Con-I

Art Unit: 2626 

12. (Original) The method of claim 2, wherein the frequency vectors are multi-dimensional 
vectors, the number of dimensions being determined by the number of context tokens and a 
number of predetermined relationships of non-context tokens to the context token being counted. 

13. (Currently Amended) A file storing a grammar model of a corpus of speech, created 
according to a method comprising: 

generating frequency vectors for each non-context token in a corpus based upon counted 
occurrences of a predetermined relationship of the non-context tokens to context tokens; 

clustering the non-context tokens into a cluster based upon the frequency vectors
according to a lexical correlation among the non-context tokens; and 

storing the non-context tokens and a representation of the clusters in a file for use in a 
pattern recognition system.

14. (Original) The file of claim 13, wherein the clusters may be represented by centroid vectors. 

15. (Original) The file of claim 13, wherein the predetermined relationship is adjacency.

16. (Original) The file of claim 13, wherein the correlation is based on Euclidean distance. 

17. (Original) The file of claim 13, wherein the correlation is based on Manhattan distance.

18. (Original) The file of claim 13, wherein the correlation is based on a maximum distance
metric.

19. (Original) The file of claim 13, wherein the frequency vectors are normalized based upon the
number of occurrences of the non-context token in the corpus.

20. (Original) The file of claim 13, wherein the frequency vectors are multi-dimensional vectors, 
the number of dimensions of which is determined by the number of context tokens and the
number of predetermined relationships of non-context tokens to context tokens. 
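
For illustration only, and not as part of the claims: the steps recited above (counting adjacency relationships to context tokens, normalizing by token occurrences, and clustering by a distance metric into a tree) might be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation. The function names, the toy corpus, the choice of "to" and "from" as context tokens, the use of immediate left and right adjacency as the two predetermined relationships, and the centroid-based greedy merging are all hypothetical choices made for the example.

```python
from collections import Counter
from itertools import combinations


def frequency_vectors(tokens, context_tokens):
    """Build one normalized frequency vector per non-context token.

    Assumption: the predetermined relationships are "context token
    immediately before" (-1) and "immediately after" (+1).
    """
    context = sorted(context_tokens)
    # One dimension per (context token, relationship) pair.
    dims = [(c, offset) for c in context for offset in (-1, 1)]
    counts = {}
    totals = Counter()
    for i, tok in enumerate(tokens):
        if tok in context:
            continue
        totals[tok] += 1
        row = counts.setdefault(tok, Counter())
        for offset in (-1, 1):
            j = i + offset
            if 0 <= j < len(tokens) and tokens[j] in context:
                row[(tokens[j], offset)] += 1
    # Normalize by the number of occurrences of the token in the corpus.
    return {t: [row[d] / totals[t] for d in dims] for t, row in counts.items()}


def euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def maximum(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


def cluster_tree(vectors, distance=euclidean):
    """Greedy bottom-up clustering; returns a nested-tuple tree.

    Assumption: each cluster is represented by the centroid of its members.
    """
    if not vectors:
        return None
    # Each entry: (subtree, centroid vector, number of member tokens).
    clusters = [(tok, vec, 1) for tok, vec in sorted(vectors.items())]
    while len(clusters) > 1:
        # Find the pair of clusters whose centroids are closest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: distance(clusters[p[0]][1], clusters[p[1]][1]),
        )
        (ta, va, na), (tb, vb, nb) = clusters[i], clusters[j]
        centroid = [(a * na + b * nb) / (na + nb) for a, b in zip(va, vb)]
        merged = ((ta, tb), centroid, na + nb)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


corpus = ("fly to boston fly to denver from boston to denver "
          "from denver to boston").split()
vectors = frequency_vectors(corpus, {"to", "from"})
tree = cluster_tree(vectors, distance=euclidean)
print(tree)
```

In this toy corpus the two city names occur in similar positions relative to the context tokens, so they are merged before the verb is. Passing `manhattan` or `maximum` as the `distance` argument swaps in the other metrics named in the claims.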